Conventional sorting algorithms are designed to sort character strings (e.g., words, phrases, and names) alphabetically according to the characters within the strings. However, in some languages, non-character symbols or marks are often added to characters to modify the pronunciation of the characters or of the string as a whole. One common type of pronunciation modifier is an accent. Accents are common in many non-English languages, such as Danish, Latin, German, and Japanese.
A drawback of computerized sorting routines is that they may mishandle character strings that contain a combination of accented and unaccented characters. Consider the Japanese case. The Japanese language includes three character sets: Kanji, Hiragana, and Katakana. The latter two character sets, Hiragana and Katakana, are collectively known as Kana characters. Kana characters include special accented characters known as "dakuten" and "handakuten" characters.
In each of the Hiragana and Katakana character sets, there are twenty dakuten characters and five handakuten characters. Dakuten characters appear identical to a companion set of unaccented Kana characters except for a small double slash accent that appears in the upper right-hand corner of the character. Handakuten characters appear identical to five of the dakuten characters except that the small double slash accent is replaced with a small circle accent.
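The relationship between an accented Kana character and its unaccented companion can be illustrated with Unicode canonical decomposition. The following is a minimal sketch only, assuming the characters are encoded in Unicode and using Python's standard unicodedata module; the helper name split_kana is hypothetical, and nothing here is drawn from any particular sorting routine.

```python
import unicodedata

# Unicode combining marks for the two Kana accents.
DAKUTEN = "\u3099"      # combining voiced sound mark (double slash)
HANDAKUTEN = "\u309A"   # combining semi-voiced sound mark (small circle)

def split_kana(ch):
    """Return (base character, accent name or None) for one Kana character.

    Hypothetical helper for illustration: it relies on canonical
    decomposition (NFD) to separate the accent from the base character.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    if DAKUTEN in decomposed:
        return base, "dakuten"
    if HANDAKUTEN in decomposed:
        return base, "handakuten"
    return base, None

if __name__ == "__main__":
    # Unaccented, dakuten, and handakuten forms from Hiragana and Katakana.
    for ch in "かがはばぱカガ":
        print(ch, split_kana(ch))
```

Under these assumptions, a dakuten character and its handakuten counterpart decompose to the same base character and differ only in the accent mark, which is the visual relationship described above.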
Conventional sorting routines are effective at sorting Kanji-only character strings and Kana-only character strings. However, problems arise when Kanji and Kana characters are mixed in the same string. The sorting routines give more weight to differences between Kanji characters in two character strings than to differences involving dakuten and handakuten characters. As a result, the sorting routines often order strings incorrectly, in a sequence that does not reflect how such character strings would appear in a Japanese dictionary or telephone book.
Accordingly, there is a need to improve processes for sorting accented characters. In the Japanese case, the goal is to sort the strings identically to how they would be listed in a Japanese dictionary or telephone book.